An Annotated Corpus of Typical Durations of Events
Authors
Abstract
In this paper, we present our work on generating an annotated corpus for extracting information about the typical durations of events from texts. We include the annotation guidelines, the event classes we categorized, the way we use normal distributions to model vague and implicit temporal information, and how we evaluate inter-annotator agreement. The experimental results show that our guidelines are effective in improving the inter-annotator agreement.
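As a rough illustration of how a normal distribution can stand in for a vague duration judgment, the sketch below turns an annotator's lower and upper bounds into a normal distribution over log-seconds and scores two annotators by how much their distributions overlap. The abstract does not give these details, so everything here is an assumption made for illustration: the log-seconds scale, the choice that the bounds cover the central 80% of the mass, the overlap-based score, and the helper names `bounds_to_normal` and `agreement`.

```python
import math
from statistics import NormalDist

# Assumption: the annotated bounds cover the central 80% of the distribution,
# so each bound sits about 1.2816 standard deviations from the mean.
Z_80 = 1.2816


def bounds_to_normal(lower_s, upper_s, z=Z_80):
    """Map duration bounds in seconds to a normal distribution over log-seconds."""
    if not 0 < lower_s < upper_s:
        raise ValueError("bounds must satisfy 0 < lower_s < upper_s")
    lo, hi = math.log(lower_s), math.log(upper_s)
    mean = (lo + hi) / 2          # midpoint of the bounds on the log scale
    sigma = (hi - lo) / (2 * z)   # half-width divided by the assumed z-score
    return NormalDist(mu=mean, sigma=sigma)


def agreement(a, b):
    """Overlapping area of two distributions: 1.0 if identical, near 0.0 if disjoint."""
    return a.overlap(b)


# Two hypothetical annotators judging the same event.
annotator_1 = bounds_to_normal(60, 3600)     # one minute to one hour
annotator_2 = bounds_to_normal(600, 7200)    # ten minutes to two hours
score = agreement(annotator_1, annotator_2)
```

Working on a log scale keeps a disagreement of "seconds versus minutes" comparable to one of "months versus years", which is one reason such a scale is a natural choice for durations. The score is the overlapping coefficient provided by Python's `statistics.NormalDist.overlap`.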
Similar resources
Extracting and modeling durations for habits and events from Twitter
We seek to automatically estimate typical durations for events and habits described in Twitter tweets. A corpus of more than 14 million tweets containing temporal duration information was collected. These tweets were classified by habituality status using a bootstrapped decision tree. For each verb lemma, associated duration information was collected for episodic and habitual uses of ...
Learning Event Durations from Event Descriptions
We have constructed a corpus of news articles in which events are annotated for estimated bounds on their duration. Here we describe a method for measuring inter-annotator agreement for these event duration distributions. We then show that machine learning techniques applied to this data yield coarse-grained event duration information, considerably outperforming a baseline and approaching human...
Annotating and Learning Event Durations in Text
This article presents our work on constructing a corpus of news articles in which events are annotated for estimated bounds on their duration, and automatically learning from this corpus. We describe the annotation guidelines, the event classes we categorized to reduce gross discrepancies in inter-annotator judgments, and our use of normal distributions to model vague and implicit temporal info...
A New Twitter Verb Lexicon for Natural Language Processing
We describe in-progress work on the creation of a new lexical resource that contains a list of 486 verbs annotated with quantified temporal durations for the events that they describe. This resource is being compiled from more than 14 million tweets from the Twitter microblogging site. We are creating this lexicon of verbs and typical durations to address a gap in the available information that...
Annotating Events, Time and Place Expressions in Arabic Texts
We present in this paper an unsupervised approach to recognize events, time and place expressions in Arabic texts. Arabic is a resource-scarce language, and annotated corpora, lexicons, and other necessary NLP tools are not readily available. We show in this work that we can recognize events, time and place expressions in Arabic texts without using a POS-annotated corpus and without a lexicon. We ...